Respect options from baseline #124

killuazhu · 2019-02-01T14:44:07Z

Fully implements the proposal in #121 (comment). More than half of the changes are new test cases.

Some of this logic could live in different places; I chose the current approach based on my understanding. Let me know if you have any suggestions or comments.

FYI @KevinHock @domanchi
CC @jribm

domanchi

Great test cases! Fix and ship!

domanchi · 2019-02-01T18:12:19Z

detect_secrets/main.py

+            if args.import_filename:
+                plugins = initialize.merge_plugin_from_baseline(
+                    _get_plugin_from_baseline(args.import_filename), args,
+                )


Let's move this block inside _perform_scan, right after we get the old_baseline.

e.g.

def _perform_scan(args, plugins):
    old_baseline = _get_existing_baseline(args.import_filename)
    if old_baseline:
        plugins = initialize.merge_plugin_from_baseline(...)

This way, we don't need to incur two file reads. Also, I'm not entirely sure that reading from stdin would work twice.
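A minimal sketch of why a single read matters. It assumes the baseline arrives as a file-like stream, with `io.StringIO` standing in for stdin. `_read_baseline_once` and `perform_scan` are hypothetical names, and the "merge" step is only a placeholder for `initialize.merge_plugin_from_baseline(...)`, not its real behavior.

```python
import io
import json


def _read_baseline_once(stream):
    """Hypothetical helper: parse a baseline from an already-open stream.

    Reading consumes the stream, and a pipe on stdin cannot be rewound,
    so this should be called at most once per stream.
    """
    text = stream.read()
    return json.loads(text) if text else None


def perform_scan(stream, plugins):
    """Hypothetical stand-in for _perform_scan: one read serves both needs."""
    old_baseline = _read_baseline_once(stream)
    if old_baseline:
        # Placeholder for initialize.merge_plugin_from_baseline(...): here we
        # simply take the plugin names recorded in the baseline.
        plugins = [p['name'] for p in old_baseline.get('plugins_used', [])]
    return old_baseline, plugins


stream = io.StringIO('{"plugins_used": [{"name": "HexHighEntropyString"}]}')
old_baseline, plugins = perform_scan(stream, ['Base64HighEntropyString'])

# A second read of the same stream comes back empty, which is why reading
# the baseline twice would break when it is piped in on stdin.
leftover = stream.read()
```

Deriving both the old baseline and the plugin list from one parsed result also saves the second file read mentioned above.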

domanchi · 2019-02-01T18:17:06Z

tests/main_test.py

+            ) == 0
+
+            print("Used:", file_writer.call_args[1]['data']['plugins_used'])
+            print("Wrote:", plugins_wrote)


Let's remove these print statements, before merging.

domanchi · 2019-02-01T18:18:16Z

tests/pre_commit_hook_test.py

@@ -44,17 +44,53 @@ def test_file_with_secrets(self, mock_log):
    def test_file_no_secrets(self):
        assert_commit_succeeds('test_data/files/file_with_no_secrets.py')

-    def test_baseline(self):
+    @pytest.mark.parametrize(
+        ' has_result, use_private_key_scan,, hook_command, commit_result',


Extra comma? Maybe s/commit_result/commit_succeeds? (There's also a leading space, but I'm less fussy about that.)

Yep, extra comma. The rename sounds good; I'll update it in a new commit.

domanchi · 2019-02-01T18:33:35Z

tests/main_test.py

+                    },
+                ],
+            ),
+            (  # ignore overwritten option from CLI when not using --use-all-plugins


Hmm. If you explicitly override it by providing the CLI argument, even without --use-all-plugins, wouldn't you expect CLI to take precedence over a configured baseline?

Example use case: user wants to see if a decrease in sensitivity results in a large drop in noise.

$ detect-secrets scan --update .secrets.baseline --base64-limit 5
$ git diff .secrets.baseline

I debated that; it has some edge cases. For example, if the baseline does not include the base64 scan, do we ignore the limit, or add the base64 scan? So I chose to require the user to enable the plugin on purpose before allowing its parameters to be adjusted. How would you suggest handling it?

It's a good point you brought up.

What about displaying a warning when they don't have the plugin configured, and then ignoring the argument? That seems to accomplish several things:

Allows CLI arguments to configure limits, when they are relevant.

Ignores CLI arguments when not relevant.

Informs the user that the arguments passed are redundant, without hard-failing the command over a poor invocation.

$ detect-secrets scan --update .secrets.baseline --base64-limit 5
WARN: --base64-limit specified, but Base64HighEntropyString not configured! Ignoring...
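The proposed policy (an explicit CLI value wins when its plugin is configured in the baseline; otherwise warn and ignore it) can be sketched as follows. `apply_cli_overrides`, the logger, and the "plugin name -> dict of parameters" shape are assumptions for illustration, not detect-secrets' actual API.

```python
import logging

log = logging.getLogger(__name__)


def apply_cli_overrides(baseline_plugins, cli_params):
    """Hypothetical sketch of the warn-and-ignore policy.

    :param baseline_plugins: plugin name -> {param name -> value}, from the baseline
    :param cli_params: plugin name -> {param name -> value}, explicitly passed on the CLI
    :returns: a new dict; baseline_plugins itself is left untouched
    """
    merged = {name: dict(params) for name, params in baseline_plugins.items()}
    for plugin_name, params in cli_params.items():
        for param_name, value in params.items():
            if plugin_name in merged:
                # Plugin is configured: the explicit CLI value takes precedence.
                merged[plugin_name][param_name] = value
            else:
                # Plugin is not configured: warn, but don't fail the command.
                log.warning(
                    '%s specified, but %s not configured! Ignoring...',
                    '--' + param_name.replace('_', '-'),
                    plugin_name,
                )
    return merged
```

For example, `apply_cli_overrides({'HexHighEntropyString': {'hex_limit': 3}}, {'Base64HighEntropyString': {'base64_limit': 5}})` returns the Hex settings unchanged and logs the warning shown above.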

It's updated to support these cases now. New test cases added.

KevinHock

looks great so far

KevinHock · 2019-02-01T19:33:34Z

detect_secrets/plugins/common/initialize.py

+    return plugins
+
+
+def _merge_plugin_from_baseline(baseline_plugins, args):


Maybe rename to _trim_disabled_plugins_from_baseline; that would make the comments in merge_plugin_from_baseline less necessary.

KevinHock · 2019-02-01T19:34:07Z

detect_secrets/plugins/common/initialize.py

+
+
+def _merge_plugin_from_baseline(baseline_plugins, args):
+    merged_plugins_dict = {vars(plugin)['name']: plugin for plugin in baseline_plugins}


Great use of vars 👍

Credit to @meneal
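A minimal sketch of why `vars(plugin)['name']` works, assuming (as discussed later in this thread) that `BasePlugin` overrides `__dict__` with a property that builds the baseline-facing dict. The classes below are simplified stand-ins, not the package's real plugin classes.

```python
class BasePlugin:
    @property
    def __dict__(self):
        # vars(plugin) goes through this property, so it returns exactly
        # the fields meant for the baseline.
        return {'name': self.__class__.__name__}


class Base64HighEntropyString(BasePlugin):
    def __init__(self, base64_limit):
        # Stored in the real instance dict, not in the property's output.
        self.entropy_limit = base64_limit

    @property
    def __dict__(self):
        output = super().__dict__
        output['base64_limit'] = self.entropy_limit
        return output


class HexHighEntropyString(BasePlugin):
    pass


baseline_plugins = [Base64HighEntropyString(4.5), HexHighEntropyString()]
merged_plugins_dict = {vars(plugin)['name']: plugin for plugin in baseline_plugins}

# 'name' is not a real attribute, so getattr()/hasattr() cannot see it.
name_is_attribute = hasattr(baseline_plugins[0], 'name')
```

The last line shows the trade-off raised later in the thread: the property is convenient for serialization, but it hides these fields from `getattr`, `hasattr`, and `setattr`.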

killuazhu · 2019-02-05T07:09:58Z

@domanchi ready for another round.

domanchi

Looking good! Thanks for writing that hairy logic of plugin prioritization.

domanchi · 2019-02-05T22:18:00Z

detect_secrets/plugins/common/initialize.py

+
+    # input param priority > baseline
+    input_plugins_dict = dict(args.plugins)
+    for plugin_name, plugin_params in list(input_plugins_dict.items()):


We shouldn't need to turn input_plugins_dict.items() into a list. In Python 3 it returns a view that can be iterated directly, so we avoid the cost of copying it into a list first. (The copy is only needed if the loop adds or removes keys from that same dict.)

Same goes for other uses of this behavior.
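A small sketch of when the `list()` copy is and isn't needed. `override_params` and `drop_plugins` are hypothetical helpers; the last function only demonstrates the `RuntimeError` Python raises when a dict changes size while it is being iterated.

```python
def override_params(target, overrides):
    # Iterating the items() view directly: no throwaway list is built,
    # and it is safe because only `target` is modified.
    for plugin_name, plugin_params in overrides.items():
        target.setdefault(plugin_name, {}).update(plugin_params)
    return target


def drop_plugins(plugins, names):
    # Here a copy IS required: we delete keys from the dict being iterated.
    for plugin_name in list(plugins):
        if plugin_name in names:
            del plugins[plugin_name]
    return plugins


def resizes_during_iteration():
    d = {'a': 1}
    try:
        for _ in d.items():
            d['b'] = 2  # grows the dict while its view is being iterated
    except RuntimeError:
        return True
    return False
```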

domanchi · 2019-02-05T22:24:40Z

detect_secrets/plugins/common/initialize.py

+    if args.disabled_plugins:
+        for plugin_name in args.disabled_plugins:
+            if plugin_name in plugins_dict:
+                plugins_dict.pop(plugin_name)


plugins_dict = {
    plugin_name: plugin_params
    for plugin_name, plugin_params in baseline_plugins_dict.items()
    if plugin_name not in args.disabled_plugins
}

domanchi · 2019-02-05T22:46:53Z

detect_secrets/plugins/common/initialize.py

+                    log.warning(
+                        '%s specified, but %s not configured! Ignoring...'
+                        % ("".join(["--", param_name.replace("_", "-")]), plugin_name),
+                    )


Can we make this DRYer, seeing that we use it below as well? One idea might be to implement an iterator, or just encapsulate it in a function. For example:

def get_prioritized_parameters(plugins, is_default_plugins_map, prefer_default=True):
    """
    :type is_default_plugins_map: dict(str => bool)
    :param is_default_plugins_map: mapping of parameter name to whether its
        value is derived from a default value.

    :param prefer_default: if True, will yield if plugin parameters are from
        default values. Otherwise, will yield if plugin parameters are *not*
        from default values.
    """
    ...
    yield plugin_name, param_name, param_value

Then, you can do:

plugins_dict = dict(args.plugins)
for plugin_name, param_name, param_value in get_prioritized_parameters(
    input_plugins_dict,
    args.param_from_default,
    prefer_default=False,
):
    try:
        plugins_dict[plugin_name][param_name] = param_value
    except KeyError:
        ...
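One possible body for the proposed generator, as a sketch. It assumes `plugins` maps plugin name to a dict of parameters and that `is_default_plugins_map` is keyed by parameter name, as the docstring describes; treating a parameter missing from the map as explicitly set is an additional assumption.

```python
def get_prioritized_parameters(plugins, is_default_plugins_map, prefer_default=True):
    """Yield (plugin_name, param_name, param_value), filtered by value origin.

    :param plugins: plugin name -> {parameter name -> value}
    :param is_default_plugins_map: parameter name -> True if the value came
        from a default rather than from the command line
    :param prefer_default: if True, yield only default-derived parameters;
        otherwise yield only explicitly set ones
    """
    for plugin_name, plugin_params in plugins.items():
        for param_name, param_value in plugin_params.items():
            # Assumption: a parameter missing from the map counts as explicit.
            is_default = is_default_plugins_map.get(param_name, False)
            if is_default == prefer_default:
                yield plugin_name, param_name, param_value


input_plugins_dict = {
    'Base64HighEntropyString': {'base64_limit': 5},
    'HexHighEntropyString': {'hex_limit': 3},
}
is_default = {'base64_limit': False, 'hex_limit': True}

explicit = list(get_prioritized_parameters(input_plugins_dict, is_default, prefer_default=False))
defaults = list(get_prioritized_parameters(input_plugins_dict, is_default))
```

The `explicit` tuples are what the `try`/`except KeyError` loop consumes; a `KeyError` there marks a plugin that isn't configured, which is where the warning would be logged.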

It's been updated now. Although it has more lines, I think the logic is a bit clearer.

Clearer logic >> more lines, especially when it comes to long term maintenance 😸

domanchi · 2019-02-05T22:54:59Z

detect_secrets/core/usage.py

@@ -309,6 +309,7 @@ def consolidate_args(args):

        active_plugins = {}


Please put a comment here that explains the rationale for why we're returning these new values.

I understand the purpose of doing it, and that there's really no better way to incorporate the baseline values. It's just unfortunate that this hairy plugin-prioritization logic needs to live in two places (contrary to what this docstring suggests).

disabled_plugins is mainly used to filter out disabled plugins when reading the plugin list from the baseline. Related to the comment below, it's possible to generate the disabled_plugins list on demand by taking the difference between all plugins and the active plugins. I can add a helper function to usage.py.

I was using disabled_plugins earlier to avoid re-iterating over all plugins to build that list.

Gotcha. Makes sense.

Thanks for adding the helper function anyway!
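The on-demand approach described above amounts to a set difference. This is a sketch only: `get_disabled_plugins` is a hypothetical name, not necessarily the helper that was added to usage.py, and the plugin names are sample values.

```python
def get_disabled_plugins(all_plugin_names, active_plugins):
    """Hypothetical helper: every known plugin that isn't active is disabled.

    :param all_plugin_names: iterable of every known plugin class name
    :param active_plugins: dict (or any iterable) of active plugin names
    :returns: sorted list of disabled plugin names
    """
    return sorted(set(all_plugin_names) - set(active_plugins))


# Sample names for illustration.
ALL_PLUGINS = ['Base64HighEntropyString', 'HexHighEntropyString', 'PrivateKeyDetector']
active = {'HexHighEntropyString': {'hex_limit': 3}}
disabled = get_disabled_plugins(ALL_PLUGINS, active)
```

The cost is one pass over all plugins per call, which is the re-iteration the stored `disabled_plugins` was originally avoiding.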

domanchi · 2019-02-05T23:02:59Z

detect_secrets/core/usage.py

@@ -309,6 +309,7 @@ def consolidate_args(args):

        active_plugins = {}
        disabled_plugins = {}


Now that I think about it, what is the purpose of returning disabled_plugins? Since args.plugins is a collection of all active plugins (combining default values and explicitly set values), wouldn't any plugin that isn't in this dictionary be disabled?

See comment above.

domanchi · 2019-02-05T23:06:00Z

detect_secrets/core/usage.py

@@ -309,6 +309,7 @@ def consolidate_args(args):

        active_plugins = {}
        disabled_plugins = {}
+        param_from_default = {}


Maybe is_default_parameter, so it's clearer that it contains boolean values?

That's a good idea. I'd suggest renaming it to is_using_default_value.
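For context, a map like `is_using_default_value` could be populated with `argparse` by using `None` as a "not passed" sentinel. This is a sketch under assumptions: the flags mirror the ones discussed in this PR, but `DEFAULTS`, its values, and `parse_args` are illustrative, not detect-secrets' actual option handling.

```python
import argparse

# Illustrative defaults only; not necessarily the package's real values.
DEFAULTS = {'base64_limit': 4.5, 'hex_limit': 3.0}


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # default=None distinguishes "not passed" from "passed the default value".
    parser.add_argument('--base64-limit', type=float, default=None)
    parser.add_argument('--hex-limit', type=float, default=None)
    args = parser.parse_args(argv)

    is_using_default_value = {}
    for param_name, default in DEFAULTS.items():
        if getattr(args, param_name) is None:
            setattr(args, param_name, default)
            is_using_default_value[param_name] = True
        else:
            is_using_default_value[param_name] = False
    args.is_using_default_value = is_using_default_value
    return args
```

With the sentinel, `--hex-limit 3` still counts as explicit even though it equals the default, so a user who types the value intentionally still overrides the baseline.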

killuazhu · 2019-02-06T01:21:53Z

Thanks for writing that hairy logic of plugin prioritization.

It definitely makes the job more challenging when I have to convert the tuple of BasePlugin objects into a dictionary, then compare and set values. I noticed the BasePlugin class makes the __dict__ attribute itself a property, instead of exposing individual properties such as name, base64_limit, etc. This prevents {set|has|get}attr from reading and writing those values. May I ask what the rationale was for overriding __dict__ instead of using a property for each field? @domanchi

killuazhu · 2019-02-06T05:02:25Z

@domanchi all comments addressed.

domanchi

May I ask what was the rationale of overwriting __dict__ instead of directly using property on each field?

Seeing that this package is designed with baseline readability in mind, I wanted an easy way for each plugin to output information that:

Was easy for humans to look at, and reason about
Was sufficient information to initialize the plugin from

I think the baseline output seems to achieve those goals.

If I'm interpreting your question correctly, I think for the most part, you can use the property on each field, with the exception that different plugins may have different properties. And I certainly don't have any qualms against something like:

class BasePlugin:
    def __init__(self):
        self.name = self.__class__.__name__

except that I didn't need it at the time :)
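A sketch of that alternative, where each field is a real instance attribute so `getattr`/`hasattr`/`setattr` work during prioritization. `to_baseline_dict` and `set_param_if_supported` are hypothetical names, and the class bodies are simplified stand-ins.

```python
class BasePlugin:
    def __init__(self):
        # A real instance attribute, visible to getattr/hasattr/setattr.
        self.name = self.__class__.__name__

    def to_baseline_dict(self):
        # Hypothetical serializer. With plain attributes, vars() returns the
        # real instance dict, so return a copy rather than internal state.
        return dict(vars(self))


class Base64HighEntropyString(BasePlugin):
    def __init__(self, base64_limit=4.5):
        super().__init__()
        self.base64_limit = base64_limit


def set_param_if_supported(plugin, param_name, value):
    """Hypothetical helper: apply an override only if the plugin has that field."""
    if not hasattr(plugin, param_name):
        return False
    setattr(plugin, param_name, value)
    return True
```

One likely trade-off: with this design, every instance attribute (including internal state) ends up in the serialized output unless it is filtered out, whereas the `__dict__` override keeps the baseline fields explicit.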

KevinHock

Looks awesome 👍

domanchi suggested changes Feb 1, 2019


KevinHock reviewed Feb 1, 2019


killuazhu force-pushed the contribute-respect-option branch from 9fbeff6 to 4570adf February 5, 2019 06:25

Xianjun Zhu added 5 commits February 5, 2019 01:25

feat: Read plugins list from baseline

9e0b619

test: simplify test cases

17fd458

test: fix test cases for slack

446cb53

fix: address review comments

37d7e06

fix: support param priority

a2f74dd

killuazhu force-pushed the contribute-respect-option branch from 4570adf to a2f74dd February 5, 2019 06:25

fix: add warn message

47b4930

domanchi reviewed Feb 5, 2019


Xianjun Zhu added 3 commits February 5, 2019 23:16

fix: remove disabled_plugins

266999a

fix: improve perf

add69bc

fix: extract get prioritized params logic

024ffc4

domanchi approved these changes Feb 6, 2019


KevinHock approved these changes Feb 6, 2019


KevinHock merged commit 462ade6 into Yelp:master Feb 6, 2019

killuazhu mentioned this pull request Feb 6, 2019

Respect plugin list from baseline #121

Closed

killuazhu deleted the contribute-respect-option branch January 9, 2020 15:13
